Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference
Authors
Abstract
Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present CHIPMUNK, a small (<1 mm²) hardware accelerator for Long Short-Term Memory RNNs in UMC 65 nm technology, capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring huge memory-transfer overhead, multiple CHIPMUNK engines can cooperate to form a single systolic array. In this way, the CHIPMUNK architecture in a 75-tile configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed in [1], consuming less than 13 mW of average power.
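As background, the core workload such an LSTM accelerator evaluates at each timestep can be sketched in plain NumPy. This is a generic LSTM cell, not CHIPMUNK's fixed-point datapath; all function and variable names here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep.

    W: (4H, X) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in gate order [input, forget, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # all four gate pre-activations at once
    i = sigmoid(z[0:H])               # input gate
    f = sigmoid(z[H:2 * H])           # forget gate
    g = np.tanh(z[2 * H:3 * H])       # candidate cell update
    o = sigmoid(z[3 * H:4 * H])       # output gate
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c
```

The two matrix-vector products on `W` and `U` dominate the operation count, which is why keeping weights stationary across a systolic array of tiles avoids re-fetching them from off-chip memory at every timestep.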
Similar resources
Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes
Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this tren...
A 910MHz Injection Locked BFSK Transceiver for Wireless Body Sensor Network Using Colpitts Oscillator
A 910MHz high-efficiency RF transceiver for Wireless Body Area Networks in medical applications is presented in this paper. High-energy-efficiency transmitter and receiver architectures are proposed. In a wireless body sensor network, the transmitter must have higher efficiency compared with the receiver, because a large amount of data is sent from the sensor node to the receiver of the base station and sma...
A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets
Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NT...
1.2-V, 10-bit, 60-360 MS/s time-interleaved pipelined analog-to-digital converter in 0.18 μm CMOS with minimised supply headroom
A low-voltage 1.2-V, 10-bit, 60–360 MS/s six-channel time-interleaved reset-opamp pipelined ADC is designed and implemented in a 0.18-μm CMOS process (VTHN/VTHP = 0.63 V/−0.65 V for mid-supply floating switches). Without using on-chip high-voltage and low-VT options, the proposed ADC employs low-voltage resistive-demultiplexing techniques, low-voltage gain-and-offset compensation, feedback current bi...
Artificial intelligence-based approaches for multi-station modelling of dissolve oxygen in river
ABSTRACT: In this study, an adaptive neuro-fuzzy inference system and a feed-forward neural network, as two artificial intelligence-based models, along with a conventional multiple linear regression model, were used for multi-station modelling of dissolved oxygen concentration downstream of Mathura City in India. The data used are dissolved oxygen, pH, biological oxygen demand and water...
Journal: CoRR
Volume: abs/1711.05734
Pages: -
Publication date: 2017